
    The Circle of Meaning: From Translation to Paraphrasing and Back

    The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this goal of more obvious importance than for the tasks of machine translation and paraphrase generation. Preserving meaning between the input and the output is paramount for both, the monolingual vs. bilingual distinction notwithstanding. In this thesis, I present a novel, symbiotic relationship between these two tasks that I term the "circle of meaning".

    Today's statistical machine translation (SMT) systems require high-quality human translations for parameter tuning, in addition to large bi-texts for learning the translation units. This parameter tuning usually involves generating translations at different points in the parameter space and obtaining feedback against human-authored reference translations as to how good the translations are. This feedback then dictates which point in the parameter space should be explored next. To measure this feedback, it is generally considered wise to have multiple (usually 4) reference translations to avoid unfairly penalizing translation hypotheses, which could easily happen given the large number of ways in which a sentence can be translated from one language to another. However, this reliance on multiple reference translations creates a problem, since they are labor-intensive and expensive to obtain. Therefore, most current MT datasets contain only a single reference. This leads to the problem of reference sparsity, the primary open problem that I address in this dissertation and one that has a serious effect on the SMT parameter tuning process.

    Bannard and Callison-Burch (2005) were the first to provide a practical connection between phrase-based statistical machine translation and paraphrase generation. However, their technique is restricted to generating phrasal paraphrases. I build upon their approach and augment a phrasal paraphrase extractor into a sentential paraphraser with extremely broad coverage. The novelty in this augmentation lies in further strengthening the connection between statistical machine translation and paraphrase generation: whereas Bannard and Callison-Burch relied on SMT machinery only to extract phrasal paraphrase rules and stopped there, I take it a few steps further and build a full English-to-English SMT system. This system can, as expected, "translate" any English input sentence into a new English sentence with the same degree of meaning preservation that exists in a bilingual SMT system. In fact, being a state-of-the-art SMT system, it is able to generate n-best "translations" for any given input sentence. This sentential paraphraser, built almost entirely from existing SMT machinery, represents the first 180 degrees of the circle of meaning.

    To complete the circle, I describe a novel connection in the other direction. I claim that the sentential paraphraser, once built in this fashion, can provide a solution to the reference sparsity problem and, hence, be used to improve the performance of a bilingual SMT system. I discuss two different instantiations of the sentential paraphraser and show several results that provide empirical validation for this connection.
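
    As a concrete illustration of the Bannard and Callison-Burch (2005) connection exploited above, the sketch below shows pivot-based phrasal paraphrase extraction, in which an English phrase is paraphrased by pivoting through its foreign translations: p(e2 | e1) = sum over f of p(e2 | f) * p(f | e1). The phrase-table entries, probabilities, and function names are illustrative placeholders, not values or code from the systems described in the thesis.

```python
from collections import defaultdict

# Toy phrase-table probabilities: p(foreign | english) and p(english | foreign).
# These numbers are illustrative placeholders, not real model estimates.
p_f_given_e = {
    "under control": {"unter kontrolle": 0.8, "in den griff": 0.2},
}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "in den griff": {"under control": 0.5, "in hand": 0.5},
}

def pivot_paraphrases(phrase):
    """Score paraphrase candidates by pivoting through foreign translations:
    p(e2 | e1) = sum_f p(e2 | f) * p(f | e1)."""
    scores = defaultdict(float)
    for foreign, p_f in p_f_given_e.get(phrase, {}).items():
        for candidate, p_e2 in p_e_given_f.get(foreign, {}).items():
            if candidate != phrase:
                scores[candidate] += p_e2 * p_f
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(pivot_paraphrases("under control"))
# -> [('in check', ~0.24), ('in hand', ~0.10)]
```

    A sentential paraphraser of the kind described above goes beyond such phrasal rules: it plugs them, together with the usual SMT models, into a full English-to-English decoder that rewrites whole sentences and produces n-best lists of paraphrases.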

    The Hiero Machine Translation System: Extensions, Evaluation, and Analysis

    Hierarchical organization is a well-known property of language, and yet the notion of hierarchical structure has been largely absent from the best-performing machine translation systems in recent community-wide evaluations. In this paper, we discuss a new hierarchical phrase-based statistical machine translation system (Chiang, 2005), presenting recent extensions to the original proposal, new evaluation results in a community-wide evaluation, and a novel technique for fine-grained comparative analysis of MT systems.
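
    For readers unfamiliar with hierarchical phrase-based translation, the sketch below shows how a synchronous grammar rule with co-indexed nonterminals captures reordering between languages. The rule and fillers are generic textbook-style examples, not rules or code taken from the system described in the paper.

```python
# A Hiero-style synchronous CFG rule pairs a source pattern with a target
# pattern via co-indexed nonterminals, e.g. X -> < X_1 de X_2 , the X_2 of X_1 >.
# The rule and fillers below are illustrative, not drawn from the actual system.
rule = {
    "source": ["X_1", "de", "X_2"],
    "target": ["the", "X_2", "of", "X_1"],
}

def apply_rule(rule, fillers):
    """Substitute translations of the co-indexed nonterminals into the
    rule's target side, producing a reordered target-language phrase."""
    out = []
    for token in rule["target"]:
        out.extend(fillers.get(token, [token]))
    return out

# Hypothetical sub-translations for the two nonterminals.
fillers = {"X_1": ["China"], "X_2": ["economic", "development"]}
print(" ".join(apply_rule(rule, fillers)))
# -> "the economic development of China"
```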

    Measuring Variability in Sentence Ordering for News Summarization

    The issue of sentence ordering is an important one for natural language tasks such as multi-document summarization, yet there has not been a quantitative exploration of the range of acceptable sentence orderings for short texts. We present results of a sentence reordering experiment with three experimental conditions. Our findings indicate a very high degree of variability in the orderings that the eighteen subjects produced. In addition, the variability of reorderings is significantly greater when the initial ordering seen by subjects is different from that of the original summary. We conclude that evaluation of sentence ordering should use multiple reference orderings. Our evaluation presents several metrics that might prove useful in assessing against multiple references. We close with a deeper set of questions: (a) what sorts of independent assessments of the quality of the different reference orderings could be made, and (b) whether a large enough test set would obviate the need for such independent means of quality assessment.
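
    The abstract does not name its metrics, so as one plausible instantiation the sketch below scores a candidate sentence ordering against multiple reference orderings using Kendall's tau, a rank-correlation measure commonly used in sentence-ordering work; the choice of metric and of taking the best-matching reference are assumptions for illustration.

```python
from itertools import combinations

def kendall_tau(order, reference):
    """Kendall's tau between two orderings of the same sentence IDs:
    1 - 2 * (discordant pairs) / (total pairs); 1.0 means identical order."""
    pos = {sid: i for i, sid in enumerate(reference)}
    pairs = list(combinations(order, 2))
    discordant = sum(1 for a, b in pairs if pos[a] > pos[b])
    return 1 - 2 * discordant / len(pairs)

def score_against_references(order, references, aggregate=max):
    """Score a candidate ordering against multiple reference orderings,
    aggregating the per-reference agreement (here by taking the best match)."""
    return aggregate(kendall_tau(order, ref) for ref in references)

candidate = [1, 3, 2, 4]
references = [[1, 2, 3, 4], [1, 3, 4, 2]]
print(score_against_references(candidate, references))
# -> 0.666..., since the candidate has one discordant pair relative to each reference
```

    Taking the best match over references mirrors the multi-reference intuition above: a candidate ordering is not penalized merely for agreeing with one acceptable ordering rather than another.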